Transmitting over a network

ABSTRACT

Data for presentation in real time, such as a video or audio sequence, is available in different encoded versions having different degrees of compression. In order to assess, during transmission of one version, the feasibility of switching to another version, given the data rate known to be available at the time, a server computes, for a candidate version, in respect of at least one portion thereof that has not yet been sent, the maximum value of a timing error that would occur if any number of portions starting with that portion were to be sent at the available rate. The selection of the same or a different version for continuing transmission is made in dependence on a comparison between the computed error and the current state of a receiving buffer. Error values may be computed in advance for a range of transmitting rates, stored and later retrieved for use in estimating an error value corresponding to the actual transmitting rate.

The present invention is concerned with methods and apparatus for transmitting encoded video, audio or other material over a network.

According to one aspect of the present invention there is provided a method of transmitting an encoded sequence over a network to a terminal, comprising: storing a plurality of encoded versions of the same sequence, wherein each version comprises a plurality of discrete portions of data and each version corresponds to a respective different degree of compression; transmitting a current one of said versions; ascertaining the data rate permitted by the network; ascertaining the state of a receiving buffer at the terminal; for at least one candidate version, computing in respect of at least one discrete portion thereof as yet unsent the maximum value of a timing error that would occur were any number of portions starting with that portion to be sent at the currently ascertained permitted rate; comparing the determined maximum error values with the ascertained buffer state; selecting one of said versions for transmission, in dependence on the results of said comparisons; and transmitting the selected version.

In another aspect, the invention provides a method of transmitting an encoded sequence over a network to a terminal, comprising: storing a plurality of encoded versions of the same sequence, wherein each version comprises a plurality of discrete portions of data and each version corresponds to a respective different degree of compression; for each version and for each of a plurality of nominal transmitting rates, computing in respect of at least one discrete portion thereof the maximum value of a timing error that would occur were any number of portions starting with that portion to be sent at the respective nominal rate; storing said maximum error values; transmitting a current one of said versions; ascertaining the data rate permitted by the network; ascertaining the state of a receiving buffer at the terminal; for at least one candidate version, using the ascertained permitted data rate and the stored maximum error values to estimate a respective maximum error value corresponding to said ascertained permitted data rate; comparing the estimated maximum error values with the ascertained buffer state; selecting one of said versions for transmission, in dependence on the results of said comparisons; and transmitting the selected version.

Further aspects of the invention are set out in the claims.

Some embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:

FIG. 1 is a block diagram of a transmission system embodying the invention;

FIG. 2 is a timing diagram; and

FIG. 3 is a flowchart explaining the operation of the control unit shown in FIG. 1.

In FIG. 1, a streamer 1 contains (or has access to) a store 11 in which are stored files each being a compressed version of a video sequence, encoded using a conventional compression algorithm such as that defined in the ITU standard H.261 or H.263, or one of the ISO MPEG standards. More particularly, the store 11 contains, for the same original video material, several files each encoded with a different degree of compression. In practice all the material could if desired be stored in one single file, but for the purposes of description they will be assumed to be separate files. Thus FIG. 1 shows three such files: V1, encoded with a high degree of compression and hence low bit-rate, representing a low-quality recording; V2, encoded with a lesser degree of compression and hence higher bit-rate, representing a medium-quality recording; and V3, encoded with a low degree of compression and hence even higher bit-rate, representing a high-quality recording. Naturally one may store similar multiple recordings of further video sequences, but this is not important to the principles of operation.

By “bit-rate” here is meant the bit-rate generated by the original encoder and consumed by the ultimate decoder; in general this is not the same as the rate at which the streamer actually transmits, which will be referred to as the transmitting bit-rate. It should also be noted that these files are generated at a variable bit-rate (VBR); that is, the number of bits generated for any particular frame of the video depends on the picture content. Consequently, references above to low (etc.) bit-rate refer to the average bit-rate.

The server has a transmitter 12 which serves to output data via a network 2 to a terminal 3. The transmitter is conventional, perhaps operating with a well known protocol such as TCP/IP. A control unit 13 serves in conventional manner to receive requests from the terminal for delivery of a particular sequence, and to read packets of data from the store 11 for sending to the transmitter 12 as and when the transmitter is able to receive them. Here it is assumed that the data are read out as discrete packets, often one packet per frame of video, though the possibility of generating more than one packet for a single frame is not excluded. (Whilst it is in principle possible for a single packet to contain data for more than one frame, this is not usually of much interest in practice.)

Note that these packets are not necessarily related to any packet structure used on the network 2.

The terminal 3 has a receiver 31, a buffer 32, primarily for accommodating short-term fluctuations in network delay and throughput, and a decoder 33. In principle, the terminal is conventional, though to get full benefit from the use of the server, one might choose to use a terminal having a larger buffer 32 than is usual.

Some networks (including TCP/IP networks) have the characteristic that the available transmitting data rate fluctuates according to the degree of loading on the network. The reason for providing alternative versions V1, V2, V3 of one and the same video sequence is that one may choose a version that the network is currently able to support. Another function of the control unit 13, therefore, is to interrogate the transmitter 12 to ascertain the transmitting data rate that is currently available, and take a decision as to which version to send. Here, as in many such systems, this is a dynamic process: during the course of a transmission the available rate is continually monitored so that as conditions improve (or deteriorate) the server may switch to a higher (or lower) quality version. Sometimes (as in TCP/IP) the available transmitting rate is not known until after transmission has begun; one solution is always to begin by sending the lowest-rate version and switch up if and when it becomes apparent that a higher quality version can be accommodated.

Some systems employ additional versions of the video sequence representing transitional data which can be transmitted between the cessation of one version and the commencement of a different one, so as to bridge any incompatibility between the two versions. If required, this may be implemented, for example, in the manner described in our U.S. Pat. No. 6,002,440.

In this description we will concentrate on the actual decision on whether and when to switch. Conventional systems compare the available transmitting bit-rate with the average bit-rates of the versions available for transmission. We have recognised, however, that this is unsatisfactory for VBR systems because it leaves open the possibility that at some time in the future the available transmitting bit-rate will be insufficient to accommodate short-term fluctuations in instantaneous bit-rate as the latter varies with picture content. Some theoretical discussion is in order at this point.

As shown in FIG. 2, an encoded video sequence consists of N packets. Each packet has a header containing a time index t_(i) (i=0 . . . N−1) (in terms of real display time; e.g. this could be the video frame number) and contains b_(i) bits. This analysis assumes that packet i must be completely received before it can be decoded (i.e. one must buffer the whole packet first).

In a simple case, each packet corresponds to one frame, and the time-stamps t_(i) increase monotonically, that is, t_(i+1)>t_(i) for all i. If however a frame can give rise to two or more packets (each with the same t_(i)) then t_(i+1)≥t_(i). If frames can run out of capture-and-display sequence (as in MPEG) then the t_(i) do not increase monotonically. Also, in practice, some frames may be dropped, so that there will be no frame for a particular value of t_(i).

These times are relative. Suppose the receiver has received packet 0 and starts decoding packet 0 at time t_(ref)+t₀. At “time now” of t_(ref)+t_(g) the receiver has received packet g (and possibly more packets too) and has just started to decode packet g.

Packets g to h−1 are in the buffer. Note that (in the simple case) if h=g+1 then the buffer contains packet g only. At time t_(ref)+t_(j) the decoder is required to start decoding packet j. Therefore, at that time t_(ref)+t_(j) the decoder will need to have received all packets up to and including packet j.

The time available from now up to t_(ref)+t_(j) is

$$(t_{ref} + t_j) - (t_{ref} + t_g) = t_j - t_g \qquad (1)$$

The data to be sent in that time are those for packets h to j, viz.

$$\sum_{i=h}^{j} b_i \qquad (2)$$

which at a transmitting rate R will require a transmission duration

$$\frac{\sum_{i=h}^{j} b_i}{R} \qquad (3)$$

This is possible only if this transmission duration is less than or equal to the time available, i.e. when the currently available transmitting rate R satisfies the inequality

$$\frac{\sum_{i=h}^{j} b_i}{R} \leq t_j - t_g \qquad (4)$$

Note that this is the condition for satisfactory reception and decoding of frame j: satisfactory transmission of the whole of the remaining sequence requires that this condition be satisfied for all j=h . . . N−1.
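The condition of Equation (4), applied to every remaining packet, can be sketched as follows. This is a minimal illustration only: the function name, argument layout and units (bits, seconds, bits per second) are assumptions made for the example and are not part of the described system.

```python
def can_deliver_on_time(bits, times, g, h, rate):
    """Check inequality (4) for every j = h .. N-1.

    bits[i]  : size b_i of packet i, in bits
    times[i] : time index t_i of packet i, in seconds
    g        : index of the packet currently being decoded
    h        : index of the first packet not yet sent
    rate     : available transmitting rate R, in bits per second
    """
    sent = 0
    for j in range(h, len(bits)):
        sent += bits[j]  # running sum of b_h .. b_j, as in expression (2)
        # Transmission duration (3) must not exceed the time available (1).
        if sent / rate > times[j] - times[g]:
            return False  # packet j would arrive after its decode time
    return True
```

Because the sum is accumulated as j advances, the whole remaining sequence is checked in a single pass.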

For reasons that will become apparent, we rewrite Equation (4) as:

$$\frac{\sum_{i=h}^{j} b_i}{R} - \left(t_j - t_{h-1}\right) \leq t_{h-1} - t_g \qquad (5)$$

Note that

$$t_j - t_{h-1} = \sum_{i=h}^{j}\left(t_i - t_{i-1}\right) = \sum_{i=h}^{j}\Delta t_i$$

where Δt_(i)=t_(i)−t_(i−1).

Also, we define Δε_(i)=(b_(i)/R)−Δt_(i)

and T_(B)=t_(h−1)−t_(g); note that T_(B) is the difference between the time-stamp of the most recently received packet in the buffer and the time-stamp of the least recently received packet in the buffer, i.e. the one that we have just started to decode. Thus, T_(B) indicates the amount of buffered information that the client has at time t_(g).

Then the condition is

$$\sum_{i=h}^{j}\Delta\varepsilon_i \leq T_B \qquad (6)$$

For a successful transmission up to the last packet N−1, this condition must be satisfied for any possible j, viz.

$$\max_{h \leq j \leq N-1}\left\{\sum_{i=h}^{j}\Delta\varepsilon_i\right\} \leq T_B \qquad (7)$$

The left-hand side of Equation (7) represents the maximum timing error that may occur from the transmission of packet h up to the end of the sequence, and the condition states, in effect, that this error must not exceed the ability of the receiver buffer to accommodate it, given its current contents. For convenience, we will label the left-hand side of Equation (7) as T_(h), i.e.

$$T_h = \max_{h \leq j \leq N-1}\left\{\sum_{i=h}^{j}\Delta\varepsilon_i\right\} \qquad (8)$$

So that Equation (7) may be written as

$$T_h \leq T_B \qquad (9)$$
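As an illustration of Equation (8), the following sketch computes T_(h) in two ways: directly from the definition, and for all h at once using a backward recurrence. The recurrence T_(h) = Δε_(h) + max(0, T_(h+1)) is not stated in the text; it follows from the definition because the maximum over j either stops at j=h or continues into the best run starting at h+1. Function names and units are assumptions made for the example.

```python
def max_timing_error_direct(bits, times, rate, h):
    """T_h from Equation (8): largest running sum of delta_eps_i, j = h .. N-1."""
    if h < 1:
        raise ValueError("h must be at least 1 so that t_(h-1) exists")
    total = 0.0
    best = None
    for j in range(h, len(bits)):
        # delta_eps_i = b_i / R - (t_i - t_(i-1))
        total += bits[j] / rate - (times[j] - times[j - 1])
        best = total if best is None else max(best, total)
    return best


def max_timing_errors_all(bits, times, rate):
    """T_h for every h = 1 .. N-1 in one backward pass (index 0 is left as None)."""
    n = len(bits)
    result = [None] * n
    for h in range(n - 1, 0, -1):
        delta_eps = bits[h] / rate - (times[h] - times[h - 1])
        if h == n - 1:
            result[h] = delta_eps
        else:
            # Either stop at packet h, or extend into the best run from h+1.
            result[h] = delta_eps + max(0.0, result[h + 1])
    return result
```

The backward pass costs one step per packet, whereas evaluating the definition separately at every h costs a pass over the remainder of the sequence each time.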

In practice we prefer to allow switching only at certain defined “switching points” in the sequence (and naturally provide the transitional data mentioned earlier only for such points). In that case the test needs to be performed only at such points.

FIG. 3 is a flowchart showing operation of the control unit 13 following selection of a video sequence for transmission. At step 100, a version, such as V1, is selected for transmission. The currently selected version number is stored. At step 101 a frame counter is reset. Then (102) the first frame (or on subsequent iterations, the next frame) of the currently selected version is read from the store 11 and sent to the transmitter 12. Normally, the frame counter is incremented at 103 and control returns to step 102 where, as soon as the transmitter is ready to accept it, a further frame is read out and transmitted. If, however, the frame is designated as a switching frame, the flag that it contains indicating this is recognised at step 104.

The switching decision at frame h may then proceed as follows:

Step 105: interrogate the transmitter 12 to determine the available transmitting rate R;

Step 106: ascertain the current value of T_(B): this may be calculated at the terminal and transmitted to the server, or may be calculated at the server (see below);

Step 107: compute (for each file V1, V2, V3) T_(h) in accordance with Equation (8); let these be called T_(h)(1), T_(h)(2), T_(h)(3);

Step 108: determine the highest value of k for which T_(h)(k)+Δ≤T_(B), where Δ is a fixed safety margin;

Step 109: select file V_(k) for transmission.

The original loop is then resumed with step 102 where the next frame is transmitted as before, but possibly from a different one of the three files V1, V2, V3.
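Steps 107 to 109 can be sketched as below. This is an illustrative sketch under stated assumptions: the versions are supplied in order of increasing quality, the switching frame has the same index h in every version, and `None` is returned when no version passes the test (the text does not say what should happen in that case). The names `max_timing_error` and `select_version` are hypothetical.

```python
def max_timing_error(bits, times, rate, h):
    """T_h of Equation (8) for one version; h must be at least 1."""
    total = 0.0
    best = None
    for j in range(h, len(bits)):
        total += bits[j] / rate - (times[j] - times[j - 1])
        best = total if best is None else max(best, total)
    return best


def select_version(versions, h, rate, t_buffer, margin=0.0):
    """Return the index k of the highest-quality version that passes step 108.

    versions : list of (bits, times) pairs, lowest quality first (assumed order)
    t_buffer : T_B, the buffered play-out time at the terminal, in seconds
    margin   : the fixed safety margin called delta in step 108
    """
    chosen = None
    for k, (bits, times) in enumerate(versions):
        if max_timing_error(bits, times, rate, h) + margin <= t_buffer:
            chosen = k  # keep the highest k that satisfies the test
    return chosen
```

A caller might treat a `None` result as "stay on, or fall back to, the lowest-rate version"; that policy is an assumption and is left to the caller here.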

The calculation of T_(B) at the server will depend on the exact method of streaming that is in use. Our preferred method is (as described in our international patent application no. PCT/GB 01/05246 [Agent's Ref. A26079]) to send, initially, video at the lowest quality, so that the terminal may immediately start decoding whilst at the same time the receiving buffer can be filling up because data is being sent at a higher rate than it is used. In this case the server can deduce current client session time (i.e. the timestamp of the packet currently being decoded at the terminal) without any feedback, and so

T_(B) = latest sent packet time − current client session time.

If the system is arranged such that the terminal waits until some desired state of buffer fullness is reached before playing begins, then the situation is not quite so simple because there is an additional delay to take into account. If this delay is fixed, it can be included in the calculation. Similarly, if the terminal calculates when to start playing and both the algorithm used, and the parameters used by the algorithm, are known by the server, again this can be taken into account. If however the terminal is of unknown type, or controls its buffer on the basis of local conditions, feedback from the terminal will be needed.

Now, this procedure will work perfectly well, but does involve a considerable amount of processing that has to be carried out during the transmission process. In a modified implementation, therefore, we prefer to perform as much as possible of this computation in advance. In principle this involves the calculation of T_(h)(k) for every packet that follows a switching point, and storing this value in the packet header. Unfortunately, this calculation (Equation (8) and the definition of Δε_(i)) involves the value of R, which is of course unknown at the time of this pre-processing. Therefore we proceed by calculating T_(h)(k) for a selection of possible values of R, for example (if R_(A) is the average bit rate of the file in question)

R₁=0.5R_(A)
R₂=0.7R_(A)
R₃=R_(A)
R₄=1.3R_(A)
R₅=2R_(A)

So each packet h has these five precalculated values of T_(h) stored in it. If required (for the purposes to be discussed below) one may also store the relative time position at which the maximum in Equation (8) occurs, that is,

Δt_(h max)=t_(j max)−t_(h)

where j max is the value of j in Equation (8) for which T_(h) is obtained, and t_(j max) is the corresponding time index.
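The pre-processing for one packet h can be sketched as follows, using the five example rate factors given above. The dictionary layout, the function name and the choice to pass the average bit rate R_(A) in as a parameter are assumptions made for illustration; the text says only that the values are stored in the packet. Δt_(h max) is computed as t_(j max)−t_(h), following the definition in the text.

```python
RATE_FACTORS = (0.5, 0.7, 1.0, 1.3, 2.0)  # example multiples of R_A from the text


def precompute_for_packet(bits, times, h, average_rate):
    """Return {nominal rate: (T_h, dt_h_max)} for packet h (h must be at least 1)."""
    table = {}
    for factor in RATE_FACTORS:
        rate = factor * average_rate
        total = 0.0
        best = None
        j_max = h
        for j in range(h, len(bits)):
            total += bits[j] / rate - (times[j] - times[j - 1])
            if best is None or total > best:
                best, j_max = total, j  # first j at which the maximum occurs
        # Store the maximum and the relative time at which it is reached.
        table[rate] = (best, times[j_max] - times[h])
    return table
```

When several values of j give the same maximum, this sketch records the earliest one; the text does not specify a tie-breaking rule.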

In this case the switching decision at frame h proceeds as follows:

interrogate the transmitter 12 to determine the available transmitting rate R;

ascertain the current value of T_(B), as before;

EITHER, in the event that R corresponds to one of the rates for which T_(h) has been precalculated, read this value from the store (for each file V1, V2, V3);

OR, in the event that R does not so correspond, read from the store the value of T_(h) (and, if required, Δt_(h max)) that corresponds to the highest one (R⁻) of the rates R₁ . . . R₅ that is less than the actual value of R, and estimate T_(h) from it (again, for each file V1, V2, V3);

determine the highest value of k for which T_(h)(k)+Δ≤T_(B), where Δ is a fixed safety margin;

select file V_(k) for transmission.

The estimate of T_(h) could be performed simply by using the value T_(h)⁻ associated with R⁻; this would work, but since it would overestimate T_(h) it would result, at times, in a switch to a higher quality stream being judged impossible even though it were possible. Another option would be linear (or other) interpolation between the values of T_(h) stored for the two values of R₁ . . . R₅ each side of the actual value R. However, our preferred approach is to calculate an estimate according to:

$$T_i' = \frac{\left(T_i^- + \Delta t_{i\,\max}^-\right) R^-}{R} - \Delta t_{i\,\max}^-$$

where R⁻ is the highest one of the rates R₁ . . . R₅ that is less than the actual value of R, T_(i)⁻ is the precalculated T_(h) for this rate, and Δt_(i max)⁻ is the time from t_(i) at which T_(i)⁻ is obtained (i.e. it is the accompanying value of Δt_(h max)). In the event that this method returns a negative value, we set it to zero.

Note that this is only an estimate, as T_(h) is a nonlinear function of rate. However with this method T_(i)′ is always higher than the true value and automatically provides a safety margin (so that the margin Δ shown above may be omitted).
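The EITHER/OR lookup and the preferred estimate can be sketched as below. The storage layout (a dictionary from nominal rate to the pair of T_(h) and Δt_(h max)) and the function name are assumptions for illustration. The text does not say what to do when the actual rate is below every stored rate, so this sketch raises an error in that case.

```python
def estimate_timing_error(rate, stored):
    """Estimate T_h at the actual rate R from values precalculated at nominal rates.

    stored : dict mapping nominal rate -> (T_h, dt_h_max), an assumed layout
    """
    if rate in stored:
        return stored[rate][0]  # EITHER: exact match, use the stored value

    lower_rates = [r for r in stored if r < rate]
    if not lower_rates:
        # Not covered by the text; treated as an error in this sketch.
        raise ValueError("actual rate is below every precalculated rate")

    r_minus = max(lower_rates)  # highest stored rate below R
    t_minus, dt_max = stored[r_minus]
    # OR: scale the transmission-time part by R-/R, keep the play-out part fixed.
    estimate = (t_minus + dt_max) * r_minus / rate - dt_max
    return max(estimate, 0.0)  # negative results are set to zero
```

For example, with values stored at 1000 bit/s and 2000 bit/s, a request at 1500 bit/s is estimated from the 1000 bit/s entry.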

Note that these equations are valid for the situation where the encoding process generates two or more packets (with equal t_(i)) for one frame, and for the situation encountered in MPEG with bidirectional prediction where the frames are transmitted in the order in which they need to be decoded, rather than in order of ascending t_(i).

The above description assumes that the test represented by Equation (7) is performed for all versions of the stored video. Although preferred, this is not essential. If large jumps in picture quality are not expected (for example because frequent switching points are provided) then the test could be performed only for the current version and one or more versions corresponding to adjacent compression rates. For example, when transmitting version V1, it might be considered sufficient to perform the test only for the current version V1 and for the nearest candidate version V2. Also, in the case of a server that interfaces with different networks, one might choose to test only those versions with data rate requirements that lie within the expected range of capability of the particular network in use.

Although the example given is for encoded video, the same method can be applied to encoded audio or indeed any other material that is to be played in real time.

1. A method of transmitting an encoded sequence over a network to a terminal, comprising storing a plurality of encoded versions of the same sequence, wherein each version comprises a plurality of discrete portions of data and each version corresponds to a respective different degree of compression; transmitting a current one of said versions; ascertaining the data rate permitted by the network; ascertaining the state of a receiving buffer at the terminal; for at least one candidate version, computing in respect of at least one discrete portion thereof as yet unsent the maximum value of a timing error that would occur were any number of portions starting with that portion to be sent at the currently ascertained permitted rate; comparing the determined maximum error values with the ascertained buffer state; selecting one of said versions for transmission, in dependence on the results of said comparisons; and transmitting the selected version.
2. A method of transmitting an encoded sequence over a network to a terminal, comprising storing a plurality of encoded versions of the same sequence, wherein each version comprises a plurality of discrete portions of data and each version corresponds to a respective different degree of compression; for each version and for each of a plurality of nominal transmitting rates, computing in respect of at least one discrete portion thereof the maximum value of a timing error that would occur were any number of portions starting with that portion to be sent at the respective nominal rate; storing said maximum error values; transmitting a current one of said versions; ascertaining the data rate permitted by the network; ascertaining the state of a receiving buffer at the terminal; for at least one candidate version, using the ascertained permitted data rate and the stored maximum error values to estimate a respective maximum error value corresponding to said ascertained permitted data rate; comparing the estimated maximum error values with the ascertained buffer state; selecting one of said versions for transmission, in dependence on the results of said comparisons; and transmitting the selected version.
3. A method according to claim 1 in which said maximum timing error determination is performed only for selected ones of said portions at which a version change is to be permitted.
4. A method according to claim 1 in which each computed timing error value is the difference between (a) the time needed to transmit, at the relevant rate, the portion in question and zero or more consecutive subsequent portions up to and including any particular portion, and (b) the difference between the playing instant of the respective particular portion and the playing instant of the portion preceding the portion in question.
5. A method according to claim 1 in which the sequence is a video sequence.
6. A method according to claim 1 in which the sequence is an audio sequence.
7. A video recording stored on a carrier, comprising a plurality of encoded versions of the same video sequence, wherein each version comprises a plurality of discrete portions of data and each version corresponds to a respective different degree of compression; and for each discrete portion of each version and for each of a plurality of nominal transmitting rates, a maximum error value for that portion, being the maximum of (a) the value of a timing error that would occur were that portion to be sent at the respective nominal rate; and (b) the values of a timing error that would occur were that portion and any number of subsequent portions subsequent thereto to be sent at the respective nominal rate.
8. An audio recording stored on a carrier, comprising a plurality of encoded versions of the same audio sequence, wherein each version comprises a plurality of discrete portions of data and each version corresponds to a respective different degree of compression; and for each discrete portion of each version and for each of a plurality of nominal transmitting rates, a maximum error value for that portion, being the maximum of (a) the value of a timing error that would occur were that portion to be sent at the respective nominal rate; and (b) the values of a timing error that would occur were that portion and any number of subsequent portions subsequent thereto to be sent at the respective nominal rate.
9. An apparatus for transmitting an encoded sequence over a network to a terminal, comprising a store storing a plurality of encoded versions of the same sequence, wherein each version comprises a plurality of discrete portions of data and each version corresponds to a respective different degree of compression; a transmitter; and control means operable to receive data as to the data rate permitted by the network and data as to the state of a receiving buffer at the terminal and, for at least one candidate version, to compute in respect of at least one discrete portion thereof as yet unsent the maximum value of a timing error that would occur were any number of portions starting with that portion to be sent at the permitted rate, to compare the determined maximum error values with the buffer state and to select one of said versions for transmission, in dependence on the results of said comparisons.
10. An apparatus for transmitting an encoded sequence over a network to a terminal, comprising a store storing a plurality of encoded versions of the same sequence, wherein each version comprises a plurality of discrete portions of data and each version corresponds to a respective different degree of compression, each version including, for each of a plurality of nominal transmitting rates, in respect of at least one discrete portion thereof, the maximum value of a timing error that would occur were any number of portions starting with that portion to be sent at the respective nominal rate; a transmitter; and control means for receiving data as to the data rate permitted by the network and data as to the state of a receiving buffer at the terminal and, for at least one candidate version, to use the permitted data rate and the stored maximum error values to estimate a respective maximum error value corresponding to said permitted data rate, to compare the estimated maximum error values with the buffer state and to select one of said versions for transmission, in dependence on the results of said comparisons.